Efficient Bulk Deletes for Multi Dimensionally Clustered Tables in DB2

نویسندگان

  • Bishwaranjan Bhattacharjee
  • Timothy Malkemus
  • Sherman Lau
  • Sean Mckeough
  • Jo-Anne Kirton
  • Robin Von Boeschoten
  • John Kennedy
چکیده

In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This feature is also known as Rollout or Bulk Deletes. Rollout is generally carried out periodically and is often done on more than one dimension or attribute. The ability to efficiently handle the updates of RID indexes while doing Rollouts is a well known problem for database engines and its solution is very important for data warehousing applications. DB2 UDB V8.1 introduced a new physical clustering scheme called Multi Dimensional Clustering (MDC) which allows users to cluster data in a table on multiple attributes or dimensions. This is very useful for query processing and maintenance activities including deletes. Subsequently, an enhancement was incorporated in DB2 UDB Viper 2 which allows for very efficient online rollout of data on dimensional boundaries even when there are a lot of secondary RID indexes defined on the table. This is done by the asynchronous updates of these RID indexes in the background while allowing the delete to commit and the table to be accessed. This paper details the design of MDC Rollout and the challenges that were encountered. It discusses some performance results which show order of magnitude improvements using it and the lessons learnt.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Performance Study of Rollout for Multi Dimensional Clustered Tables in DB2

In data warehousing applications, the ability to efficiently delete large chunks of data from a table is very important. This feature is also known as Rollout. Rollout is generally carried out periodically and is often done on more than one dimension or attribute. DB2 UDB V8.1 introduced a new physical clustering scheme called Multi Dimensional Clustering (MDC) which allows users to cluster dat...

متن کامل

Efficient Query Processing for Multi-Dimensionally Clustered Tables in DB2

We have introduced a Multi-Dimensional Clustering (MDC) physical layout scheme in DB2 version 8.0 for relational tables. MultiDimensional Clustering is based on the definition of one or more orthogonal clustering attributes (or expressions) of a table. The table is organized physically by associating records with similar values for the dimension attributes in a cluster. Each clustering key is a...

متن کامل

Efficient Bulk Deletes in Relational Databases

Many applications require that large amounts of data are deleted from the database – typically, such bulk deletes are carried out periodically and involve old or out-of-date data. If the data is not partitioned in such a way that bulk deletes can be carried out by simply deleting whole partitions, then most current database products execute such bulk delete operations very poorly. The reason is...

متن کامل

Autonomous Data Ingestion Tuning in Data Warehouse Accelerators

The IBM DB2 Analytics Accelerator (IDAA) is a state-of-the art hybrid database system that seamlessly extends the strong transactional capabilities of DB2 for z/OS with very fast processing of OLAP and analytical SQL workload in Netezza. IDAA copies the data from DB2 for z/OS into its Netezza backend, and customers can tailor data maintenance according to their needs. This copy process, the dat...

متن کامل

Using the SAS/ACCESS® Libname Technology to Get Improvements in Performance and Optimizations in SAS/SQL Queries

This paper highlights the new features in the SAS/ACCESS libname engines that, when used judiciously, can improve overall engine scalability in the areas of loading/extraction, ASYNC I/O, and SQL-based query optimizations. The new loading/extraction engine features are: • Multi-row reads • DBKEY • Bulk loading • Multi-row writes The new ASYNC I/O features are: • PreFetch • SAS server task switc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007